
    Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi

    Intel Xeon Phi is a recently released high-performance coprocessor featuring 61 cores, each supporting 4 hardware threads with 512-bit-wide SIMD registers, for a theoretical peak performance of 1 Tflop/s in double precision. Many scientific applications involve operations on large sparse matrices, such as linear solvers, eigensolvers, and graph mining algorithms. The core of most of these applications is the multiplication of a large, sparse matrix with a dense vector (SpMV). In this paper, we investigate the performance of the Xeon Phi coprocessor for SpMV. We first provide a comprehensive introduction to this new architecture and analyze its peak performance with a number of microbenchmarks. Although the design of a Xeon Phi core is not much different from those of the cores in modern processors, its large number of cores and hyperthreading capability allow many applications to saturate the available memory bandwidth, which is not the case for many cutting-edge processors. Yet our performance studies show that it is the memory latency, not the bandwidth, that creates a bottleneck for SpMV on this architecture. Finally, our experiments show that Xeon Phi's sparse kernel performance is very promising, and even better than that of cutting-edge general-purpose processors and GPUs.
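    For reference, SpMV over a matrix in compressed sparse row (CSR) form reduces to one dot product per row. The sketch below, in plain Python/NumPy, shows the loop structure that a tuned kernel would vectorize and thread across cores; it illustrates the operation itself, not the paper's optimized Xeon Phi code.

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a sparse matrix A in CSR format.

    values  : nonzero entries, stored row by row
    col_idx : column index of each nonzero
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i's nonzeros
    """
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Each row is an independent dot product; this is the loop a
        # vectorized, multithreaded kernel would parallelize.
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# Example: A = [[1, 0, 2], [0, 3, 0]]
values  = np.array([1.0, 2.0, 3.0])
col_idx = np.array([0, 2, 1])
row_ptr = np.array([0, 2, 3])
x = np.array([1.0, 1.0, 1.0])
print(spmv_csr(values, col_idx, row_ptr, x))  # [3. 3.]
```

    The irregular, index-driven access to x is what makes SpMV latency-sensitive: the gathered elements of x rarely sit in cache, which matches the paper's finding that memory latency, not bandwidth, is the bottleneck.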

    Fastpass: A Centralized “Zero-Queue” Datacenter Network

    An ideal datacenter network should provide several properties, including low median and tail latency, high utilization (throughput), fair allocation of network resources between users or applications, deadline-aware scheduling, and congestion (loss) avoidance. Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. Instead, we propose that each sender should delegate to a centralized arbiter control of when each packet should be transmitted and what path it should follow. This paper describes Fastpass, a datacenter network architecture built using this principle. Fastpass incorporates two fast algorithms: the first determines the time at which each packet should be transmitted, while the second determines the path to use for that packet. In addition, Fastpass uses an efficient protocol between the endpoints and the arbiter, and an arbiter replication strategy for fault-tolerant failover. We deployed and evaluated Fastpass in a portion of Facebook's datacenter network. Our results show that Fastpass achieves throughput comparable to current networks with a 240× reduction in queue lengths (from 4.35 Mbytes to 18 Kbytes), achieves much fairer and more consistent flow throughputs than the baseline TCP (a 5200× reduction in the standard deviation of per-flow throughput with five concurrent connections), scales from 1 to 8 cores in the arbiter implementation with the ability to schedule 2.21 Terabits/s of traffic in software on eight cores, and achieves a 2.5× reduction in the number of TCP retransmissions in a latency-sensitive service at Facebook. Funding: National Science Foundation (U.S.) (grant IIS-1065219); Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowship; Hertz Foundation (Fellowship).
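    To make the arbiter's first task concrete, the toy sketch below allocates packets to timeslots under the matching constraint that each endpoint sends at most one packet and receives at most one packet per slot. The greedy FIFO policy and all names here are hypothetical simplifications for illustration; Fastpass's actual pipelined timeslot-allocation algorithm is not reproduced.

```python
from collections import deque

def allocate_timeslots(demands, num_slots):
    """Toy centralized timeslot allocation.

    demands: list of (src, dst, n_packets) requests.
    Constraint per timeslot: each source sends at most one packet and
    each destination receives at most one packet (a matching).
    Returns {slot: [(src, dst), ...]}.
    """
    # Expand demands into a FIFO of unit-packet requests (a hypothetical
    # policy; the real arbiter pipelines this and tracks per-flow state).
    pending = deque((s, d) for s, d, n in demands for _ in range(n))
    schedule = {}
    for slot in range(num_slots):
        busy_src, busy_dst = set(), set()
        assigned, leftover = [], deque()
        while pending:
            s, d = pending.popleft()
            if s not in busy_src and d not in busy_dst:
                busy_src.add(s)
                busy_dst.add(d)
                assigned.append((s, d))
            else:
                leftover.append((s, d))  # retry in a later slot
        pending = leftover
        schedule[slot] = assigned
        if not pending:
            break
    return schedule

# Two senders contending for destination "B": the arbiter serializes them.
print(allocate_timeslots([("A", "B", 2), ("C", "B", 1), ("A", "D", 1)], 4))
```

    The second arbiter algorithm, path selection, would then assign each scheduled (src, dst) pair a path through the fabric for its slot; it is omitted here.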

    Estimated glucose disposal rate demographics and clinical characteristics of young adults with type 1 diabetes mellitus: A cross-sectional pilot study

    Background: Estimated glucose disposal rate (eGDR) is a practical measure of insulin resistance (IR) that can be easily incorporated into clinical practice. We profiled eGDR in younger adults with type 1 diabetes mellitus (T1DM) by their demographic and clinical characteristics. Methods: In this single-centre study, medical records of T1DM individuals were assessed and eGDR tertiles were correlated with demographic and clinical variables. Results: Of 175 T1DM individuals, 108 (61.7%) were male. Mean age (±SD) was 22.0 ± 1.6 years and median time from diagnosis was 11.0 years (range 1–23). Individuals were predominantly Caucasian (81.7%), with 27.4% being overweight (BMI: 25–30 kg/m2) and 13.7% obese (BMI > 30 kg/m2). Mean total cholesterol (TC) levels were significantly lower in the high and middle eGDR tertiles (4.4 ± 1 and 4.3 ± 0.8 mmol/l, respectively) compared with the low eGDR tertile (4.8 ± 1 mmol/l; p < 0.05 for both). Triglyceride (TG) levels showed a similar trend, at 1.1 ± 0.5 and 1.1 ± 0.5 mmol/l for the high and middle eGDR tertiles compared to the low eGDR tertile (1.5 ± 1 mmol/l; p < 0.05 for both). Renal function was similar across eGDR tertiles and no difference in retinopathy was detected. Conclusion: TC and TG are altered in individuals with T1DM and low eGDR, suggesting that this subgroup requires optimal lipid management to ameliorate their vascular risk.
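    The abstract does not restate the eGDR formula. One commonly cited formulation (from Williams et al.'s Pittsburgh EDC work) combines waist-to-hip ratio, hypertension status, and HbA1c; the sketch below assumes that variant, and whether this study used it or a waist-circumference version is not stated here.

```python
def egdr_mg_per_kg_min(whr, hypertensive, hba1c_percent):
    """Estimated glucose disposal rate (mg/kg/min).

    One commonly cited formulation (Williams et al., Pittsburgh EDC
    cohort); assuming this exact variant is used is an assumption.
      whr           : waist-to-hip ratio
      hypertensive  : True/1 if hypertension is present
      hba1c_percent : HbA1c as a percentage
    Lower eGDR indicates greater insulin resistance.
    """
    return 24.31 - 12.22 * whr - 3.29 * int(hypertensive) - 0.57 * hba1c_percent

# Example: WHR 0.85, no hypertension, HbA1c 8.0% -> ~9.36 mg/kg/min
print(round(egdr_mg_per_kg_min(0.85, False, 8.0), 2))
```

    Because every input is routinely collected in clinic, the score can be computed at the point of care, which is what the abstract means by "easily incorporated into clinical practice".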

    Weitz

    ABSTRACT
    Technological advancements, environmental regulations, and emphasis on resource conservation and recovery have greatly reduced the environmental impacts of municipal solid waste (MSW) management, including emissions of greenhouse gases (GHGs). This study was conducted using a life-cycle methodology to track changes in GHG emissions during the past 25 years from the management of MSW in the United States. For the baseline year of 1974, MSW management consisted of limited recycling, combustion without energy recovery, and landfilling without gas collection or control. This was compared with data for 1980, 1990, and 1997, accounting for changes in MSW quantity, composition, management practices, and technology. Over time, the United States has moved toward increased recycling, composting, combustion (with energy recovery), and landfilling with gas recovery, control, and utilization. These changes were accounted for with historical data on MSW composition, quantities, management practices, and technological changes. Included in the analysis were the benefits of materials recycling and energy recovery to the extent that these displace virgin raw materials and fossil fuel electricity production, respectively. Carbon sinks associated with MSW management also were addressed. The results indicate that the MSW management actions taken by U.S. communities have significantly reduced potential GHG emissions despite an almost 2-fold increase in waste generation. GHG emissions from MSW management were estimated to be 36 million metric tons carbon equivalents (MMTCE) in 1974 and 8 MMTCE in 1997. If MSW were being managed today as it was in 1974, GHG emissions would be ~60 MMTCE.

    INTRODUCTION
    Solid waste management deals with the way resources are used as well as with end-of-life deposition of materials in the waste stream.[1] Often complex decisions are made regarding ways to collect, recycle, transport, and dispose of municipal solid waste (MSW) that affect cost and environmental releases. Prior to 1970, sanitary landfills were very rare. Wastes were "dumped" and organic materials in the dumps were burned to reduce volume. Waste incinerators with no pollution controls were common.[1] Today, solid waste management involves technologies that are more energy efficient and protective of human health and the environment. These technological changes and improvements are the result of decisions made by local communities and can impact residents directly. Selection of collection, transportation, recycling, treatment, and disposal systems can determine the number of recycling bins needed, the day people must place their garbage at the curb, the truck routes through residential streets, and the cost of waste services to households. Thus, MSW management can be a significant issue for municipalities.

    IMPLICATIONS
    Technology advancements and the movement toward integrated strategies for MSW management have resulted in reduced GHG emissions. GHG emissions from MSW management would be 52 MMTCE higher today if old strategies and technologies were still in use. Integrated strategies involving recycling, composting, waste-to-energy combustion, and landfills with gas collection and energy recovery play a significant role in reducing GHG emissions by recovering materials and energy from the MSW stream.
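    As a minimal sketch of the life-cycle accounting the study describes, net GHG emissions can be framed as direct emissions from waste management minus the offsets from recycled materials, recovered energy, and carbon sinks. The numbers below are illustrative placeholders, not the study's data.

```python
def net_ghg_mmtce(direct, recycling_offset, energy_offset, carbon_sink):
    """Net life-cycle GHG (MMTCE): direct emissions from collection,
    landfilling, and combustion, minus offsets for displaced virgin
    materials and fossil electricity, minus carbon sinks.
    Sign convention and all inputs are illustrative assumptions.
    """
    return direct - recycling_offset - energy_offset - carbon_sink

# Illustrative scenario: 20 MMTCE of direct emissions reduced to a net
# 8 MMTCE once recycling, energy recovery, and sinks are credited.
print(net_ghg_mmtce(direct=20.0, recycling_offset=7.0,
                    energy_offset=3.0, carbon_sink=2.0))  # 8.0
```

    This offsetting is why the 1997 net figure (8 MMTCE) can be so far below the 1974-technology counterfactual (~60 MMTCE), a gap of roughly 52 MMTCE.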

    ESTIMA: Extrapolating ScalabiliTy of In-Memory Applications

    This paper presents ESTIMA, an easy-to-use tool for extrapolating the scalability of in-memory applications. ESTIMA is designed to perform a simple yet important task: given the performance of an application on a small machine with a handful of cores, ESTIMA extrapolates its scalability to a larger machine with more cores, while requiring minimal input from the user. The key idea underlying ESTIMA is the use of stalled cycles (i.e., cycles that the processor spends waiting for various events, such as cache misses or lock acquisition). ESTIMA measures stalled cycles on a few cores and extrapolates them to more cores, estimating the amount of waiting in the system. ESTIMA can be effectively used to predict the scalability of in-memory applications. For instance, using measurements of memcached and SQLite on a desktop machine, we obtain accurate predictions of their scalability on a server. Our extensive evaluation on a large number of in-memory benchmarks shows that ESTIMA has generally low prediction errors.
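    A minimal sketch of the underlying idea: measure the stalled-cycle fraction at a few core counts, fit a simple model, and extrapolate to the target core count. The quadratic fit and the speedup formula below are assumptions chosen for illustration; the abstract does not specify ESTIMA's actual model.

```python
import numpy as np

def predict_speedup(cores_measured, stalled_frac_measured, cores_target):
    """Extrapolate scalability from stalled cycles (ESTIMA-style idea).

    Fit a simple model of the stalled-cycle fraction vs. core count
    (a quadratic here -- an assumption; the tool's real model may
    differ), then estimate speedup as cores * (1 - stalled fraction).
    """
    coeffs = np.polyfit(cores_measured, stalled_frac_measured, deg=2)
    stalled_at_target = np.clip(np.polyval(coeffs, cores_target), 0.0, 1.0)
    return cores_target * (1.0 - stalled_at_target)

# Measurements on a small desktop machine (hypothetical numbers):
cores = np.array([1, 2, 4])
stalled = np.array([0.05, 0.10, 0.20])
print(predict_speedup(cores, stalled, cores_target=16))  # ~3.2
```

    The appeal of the approach is that stalled cycles fold cache misses, memory latency, and lock contention into one measurable quantity, so the user only supplies a few short profiling runs.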

    The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems

    To provide low latency and high throughput guarantees, most large key-value stores keep the data in the memory of many servers. Despite the natural parallelism across lookups, the load imbalance introduced by heavy skew in the popularity distribution of keys limits performance. To avoid violating tail latency service-level objectives, systems tend to keep server utilization low and organize the data in micro-shards, which provide units of migration and replication for the purpose of load balancing. These techniques reduce the skew but incur additional monitoring, data replication, and consistency maintenance overheads. In this work, we introduce RackOut, a memory pooling technique that leverages the one-sided remote read primitive of emerging rack-scale systems to mitigate load imbalance while respecting service-level objectives. In RackOut, the data is aggregated at rack-scale granularity, with all of the participating servers in the rack jointly servicing all of the rack's micro-shards. We develop a queuing model to evaluate the impact of RackOut at the datacenter scale. In addition, we implement a RackOut proof-of-concept key-value store, evaluate it on two experimental platforms based on RDMA and Scale-Out NUMA, and use these results to validate the model. Our results show that RackOut can increase throughput by up to 6× for RDMA and 8.6× for Scale-Out NUMA compared to a scale-out deployment, while respecting tight tail latency service-level objectives.
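    The load-imbalance argument can be illustrated with a small simulation: under a Zipf-like key popularity, the hottest individual server carries far more load than the hottest rack does per pooled server. The distribution and parameters below are hypothetical and are not the paper's queuing model.

```python
import random
from collections import Counter

def hottest(requests, unit_of):
    """Requests handled by the most-loaded unit under mapping unit_of."""
    return max(Counter(unit_of(k) for k in requests).values())

random.seed(42)
NUM_SERVERS, RACK_SIZE, NUM_KEYS = 64, 8, 10_000

# Zipf-like popularity: key k is requested with weight 1/(k+1).
weights = [1.0 / (k + 1) for k in range(NUM_KEYS)]
requests = random.choices(range(NUM_KEYS), weights=weights, k=200_000)

# Scale-out: each key belongs to exactly one server.
per_server = hottest(requests, lambda k: k % NUM_SERVERS)
# RackOut-style pooling: the rack's servers jointly serve its keys.
per_rack = hottest(requests,
                   lambda k: (k % NUM_SERVERS) // RACK_SIZE)

# Pooling at rack granularity lets RACK_SIZE servers share the hottest
# micro-shards, so per-server load on the hottest rack is much lower
# than on the hottest individual server.
print("hottest server:", per_server)
print("hottest rack, per server:", per_rack / RACK_SIZE)
```

    One-sided remote reads are what make this pooling cheap: any server in the rack can serve a read against a sibling's memory without involving that sibling's CPU.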